An Empirical Analysis of Proximal Policy Optimization with Kronecker-factored Natural Gradients
Authors
Abstract
Deep reinforcement learning methods have shown tremendous success in a large variety of tasks, such as Go [Silver et al., 2016], Atari [Mnih et al., 2013], and continuous control [Lillicrap et al., 2015, Schulman et al., 2015]. Policy gradient methods [Williams, 1992] are an important family of methods in model-free reinforcement learning, and the current state-of-the-art policy gradient methods are Proximal Policy Optimization (PPO) [Schulman et al., 2017] and ACKTR [Wu et al., 2017]. The two methods, however, take different approaches to improving sample efficiency: PPO uses a particular “clipping” objective that mimics a trust region, whereas ACKTR uses approximated natural gradients that balance speed and optimization.
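As a rough illustration of the “clipping” objective mentioned in the abstract, the sketch below computes the clipped surrogate described in the PPO paper [Schulman et al., 2017] for a batch of samples. It is not code from this paper: the function name `ppo_clip_objective`, the plain-list inputs, and the default `epsilon=0.2` are assumptions made for the example.

```python
def ppo_clip_objective(ratios, advantages, epsilon=0.2):
    """Mean clipped surrogate over a batch (illustrative sketch only).

    ratios     -- probability ratios pi_new(a|s) / pi_old(a|s), one per sample
    advantages -- advantage estimates, one per sample
    epsilon    -- clipping range; 0.2 is an assumed default, not a value
                  taken from this abstract
    """
    if len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must have the same length")
    if not ratios:
        raise ValueError("batch must not be empty")

    total = 0.0
    for r, a in zip(ratios, advantages):
        # Restrict the ratio to the interval [1 - epsilon, 1 + epsilon].
        clipped_r = max(1.0 - epsilon, min(r, 1.0 + epsilon))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(r * a, clipped_r * a)
    return total / len(ratios)
```

Because the minimum is taken per sample, moving the ratio outside the clipping interval in the direction that would increase the objective yields no further gain, which is how the objective discourages large policy updates without an explicit trust-region constraint.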
Similar resources
Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation
In this work, we propose to apply trust region optimization to deep reinforcement learning using a recently proposed Kronecker-factored approximation to the curvature. We extend the framework of natural policy gradient and propose to optimize both the actor and the critic using Kronecker-factored approximate curvature (K-FAC) with trust region; hence we call our method Actor Critic using Kronec...
A Kronecker-factored approximate Fisher matrix for convolution layers
Second-order optimization methods such as natural gradient descent have the potential to speed up training of neural networks by correcting for the curvature of the loss function. Unfortunately, the exact natural gradient is impractical to compute for large models, and most approximations either require an expensive iterative procedure or make crude approximations to the curvature. We present K...
Optimization of thermal curing cycle for a large epoxy model
Heat generation in an exothermic reaction during the curing process and low thermal conductivity of the epoxy resin produces high peak temperature and temperature gradients which result in internal and residual stresses, especially in large epoxy samples. In this paper, an optimization algorithm was developed and applied to predict the thermal cure cycle to minimize the temperature peak and the...
Taking gradients through experiments: LSTMs and memory proximal policy optimization for black-box quantum control
In this work we introduce the application of black-box quantum control as an interesting reinforcement learning problem to the machine learning community. We analyze the structure of the reinforcement learning problems arising in quantum physics and argue that agents parameterized by long short-term memory (LSTM) networks trained via stochastic policy gradients yield a general method to solving...
Optimization and sound absorption modeling in Yucca Gloriosa natural fiber composite
Introduction: Nowadays, the acoustic behavior analysis of natural fibers composites has received increasing attention by researchers. In this regard, the present study aimed to optimize and model the sound absorption behavior of composites made of Yucca Gloriosa (YG) fiber via using a mathematical modeling approach. Methodology: In this experimental cross-sectional study, in order to fabricate...
Journal title:
- CoRR
Volume abs/1801.05566 (no issue number listed)
Pages -
Publication date 2018